Results 1 - 20 of 33
1.
medRxiv ; 2023 Jun 03.
Article in English | MEDLINE | ID: mdl-37398384

ABSTRACT

Introduction: Drug repurposing involves finding new therapeutic uses for already approved drugs, which can save costs because their pharmacokinetics and pharmacodynamics are already known. Predicting efficacy based on clinical endpoints is valuable for designing phase 3 trials and making Go/No-Go decisions, given the potential for confounding effects in phase 2. Objectives: This study aims to predict the efficacy of repurposed Heart Failure (HF) drugs in phase 3 clinical trials. Methods: Our study presents a comprehensive framework for predicting drug efficacy in phase 3 trials, which combines drug-target prediction using biomedical knowledgebases with statistical analysis of real-world data. We developed a novel drug-target prediction model that uses low-dimensional representations of drug chemical structures and gene sequences, together with a biomedical knowledgebase. Furthermore, we conducted statistical analyses of electronic health records to assess the effectiveness of repurposed drugs in relation to clinical measurements (e.g., NT-proBNP). Results: We identified 24 repurposed drugs (9 with a positive effect and 15 with a non-positive effect) for heart failure from 266 phase 3 clinical trials. We used 25 genes related to heart failure for drug-target prediction, as well as electronic health records (EHR) from the Mayo Clinic for screening, which contained over 58,000 heart failure patients treated with various drugs and categorized by heart failure subtypes. Our proposed drug-target predictive model performed exceptionally well across all seven tests in the BETA benchmark compared with the six cutting-edge baseline methods (i.e., it performed best in 266 out of 404 tasks). For the overall prediction of the 24 drugs, our model achieved an AUROC of 82.59% and a PRAUC (average precision) of 73.39%. Conclusion: The study demonstrated exceptional results in predicting the efficacy of repurposed drugs for phase 3 clinical trials, highlighting the potential of this method to facilitate computational drug repurposing.
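
A minimal sketch of how per-drug efficacy predictions can be scored against phase 3 trial outcomes with the two metrics named above (AUROC and average precision), using scikit-learn. The labels and scores below are illustrative placeholders, not the 24-drug data from the study.

```python
# Hedged sketch: scoring binary efficacy predictions for repurposed drugs.
# 1 = positive phase 3 effect, 0 = non-positive effect (hypothetical values).
from sklearn.metrics import roc_auc_score, average_precision_score

labels = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.91, 0.35, 0.78, 0.66, 0.42, 0.12, 0.88, 0.55]

print(f"AUROC: {roc_auc_score(labels, scores):.4f}")
print(f"PRAUC (average precision): {average_precision_score(labels, scores):.4f}")
```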

2.
J Am Med Inform Assoc ; 30(10): 1645-1656, 2023 09 25.
Article in English | MEDLINE | ID: mdl-37463858

ABSTRACT

BACKGROUND: Alzheimer's disease (AD) is a progressive neurological disorder with no specific curative medications. Sophisticated clinical skills are crucial to optimize treatment regimens given the multiple coexisting comorbidities in the patient population. OBJECTIVE: Here, we propose a study to leverage reinforcement learning (RL) to learn clinicians' decisions for AD patients based on longitudinal data from electronic health records. METHODS: In this study, we selected 1736 patients from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database. We focused on the two most frequent concomitant diseases, depression and hypertension, thus creating 5 data cohorts (ie, Whole Data, AD, AD-Hypertension, AD-Depression, and AD-Depression-Hypertension). We modeled treatment learning as an RL problem by defining states, actions, and rewards. We built a regression model and a decision tree to generate multiple states, used six combinations of medications (ie, cholinesterase inhibitors, memantine, memantine-cholinesterase inhibitors, hypertension drugs, supplements, or no drugs) as actions, and Mini-Mental State Exam (MMSE) scores as rewards. RESULTS: Given the proper dataset, the RL model can generate an optimal policy (regimen plan) that outperforms the clinician's treatment regimen. Optimal policies (ie, policy iteration and Q-learning) had lower rewards than the clinician's policy (mean -3.03 and -2.93 vs. -2.93, respectively) for smaller datasets but had higher rewards for larger datasets (mean -4.68 and -2.82 vs. -4.57, respectively). CONCLUSIONS: Our results highlight the potential of using RL to generate the optimal treatment based on patients' longitudinal records. Our work can lead the path towards developing RL-based decision support systems that could help manage AD with comorbidities.
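
A minimal tabular Q-learning sketch of the formulation described above: discretized cognitive states, six medication actions, and cognitive change as the reward. The environment dynamics here are synthetic stand-ins for illustration, not the ADNI-derived states, actions, or rewards from the study.

```python
# Hedged Q-learning sketch with hypothetical dynamics (not the ADNI model).
import numpy as np

n_states, n_actions = 5, 6          # e.g., MMSE severity bins x medication combinations
alpha, gamma, episodes = 0.1, 0.9, 500
rng = np.random.default_rng(0)
Q = np.zeros((n_states, n_actions))

def step(state, action):
    """Hypothetical environment: returns (next_state, reward = change in cognition)."""
    next_state = int(min(n_states - 1, max(0, state + rng.integers(-1, 2))))
    reward = float(next_state - state)   # cognitive improvement as a reward proxy
    return next_state, reward

for _ in range(episodes):
    s = int(rng.integers(n_states))
    for _ in range(10):                  # finite treatment horizon
        a = int(rng.integers(n_actions)) if rng.random() < 0.1 else int(Q[s].argmax())
        s2, r = step(s, a)
        Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
        s = s2

print("Greedy policy (best action index per state):", Q.argmax(axis=1))
```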


Subjects
Alzheimer Disease, Humans, Alzheimer Disease/drug therapy, Memantine/therapeutic use, Cholinesterase Inhibitors/therapeutic use, Artificial Intelligence, Learning
3.
medRxiv ; 2023 Jun 05.
Article in English | MEDLINE | ID: mdl-37333219

ABSTRACT

Pharmacogenomics datasets have been generated for various purposes, such as investigating different biomarkers. However, when studying the same cell line with the same drugs, differences in drug responses exist between studies. These variations arise from factors such as inter-tumoral heterogeneity, experimental standardization, and the complexity of cell subtypes. Consequently, drug response prediction suffers from limited generalizability. To address these challenges, we propose a computational model based on Federated Learning (FL) for drug response prediction. By leveraging three pharmacogenomics datasets (CCLE, GDSC2, and gCSI), we evaluate the performance of our model across diverse cell line-based databases. Our results demonstrate superior predictive performance compared to baseline methods and traditional FL approaches through various experimental tests. This study underscores the potential of employing FL to leverage multiple data sources, enabling the development of generalized models that account for inconsistencies among pharmacogenomics datasets. By addressing the limitations of low generalizability, our approach contributes to advancing drug response prediction in precision oncology.
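
A minimal federated averaging (FedAvg) sketch of the general idea: several sites each take local gradient steps on a shared model and a server averages the weights. The three simulated sites below stand in for the pharmacogenomics sources named above (CCLE, GDSC2, gCSI); the data and the linear drug-response model are synthetic, not the authors' FL architecture.

```python
# Hedged FedAvg sketch over three simulated sites with synthetic data.
import numpy as np

rng = np.random.default_rng(0)
d = 20                                    # feature dimension (e.g., expression features)
true_w = rng.normal(size=d)
sites = []
for _ in range(3):                        # three simulated pharmacogenomics sites
    X = rng.normal(size=(100, d))
    y = X @ true_w + rng.normal(scale=0.5, size=100)
    sites.append((X, y))

global_w = np.zeros(d)
for _ in range(20):                       # communication rounds
    local_ws = []
    for X, y in sites:
        w = global_w.copy()
        for _ in range(5):                # local epochs of gradient descent on MSE
            w -= 0.05 * (2 * X.T @ (X @ w - y) / len(y))
        local_ws.append(w)
    global_w = np.mean(local_ws, axis=0)  # FedAvg: average the local weights

print("Weight recovery error:", round(float(np.linalg.norm(global_w - true_w)), 4))
```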

4.
medRxiv ; 2023 Jan 29.
Article in English | MEDLINE | ID: mdl-36747733

ABSTRACT

Background: Alzheimer's Disease (AD) is a progressive neurological disorder with no specific curative medications. While only a few medications are approved by the FDA (i.e., donepezil, galantamine, rivastigmine, and memantine) to relieve symptoms (e.g., cognitive decline), sophisticated clinical skills are crucial to optimize the appropriate regimens given the multiple coexisting comorbidities in this patient population. Objective: Here, we propose a study to leverage reinforcement learning (RL) to learn clinicians' decisions for AD patients based on longitudinal records from Electronic Health Records (EHR). Methods: In this study, we extracted 1,736 patients fulfilling our criteria from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database. We focused on the two most frequent concomitant diseases, depression and hypertension, resulting in five main cohorts: 1) whole data, 2) AD-only, 3) AD-hypertension, 4) AD-depression, and 5) AD-hypertension-depression. We modeled treatment learning as an RL problem by defining the three RL factors (i.e., states, actions, and rewards) in multiple strategies: a regression model and a decision tree were developed to generate states, six main extracted medication groups (i.e., no drugs, cholinesterase inhibitors, memantine, hypertension drugs, a combination of cholinesterase inhibitors and memantine, and supplements or other drugs) served as actions, and Mini-Mental State Exam (MMSE) scores served as rewards. Results: Given the proper dataset, the RL model can generate an optimal policy (regimen plan) that outperforms the clinician's treatment regimen. With the smallest data samples, the optimal policies (i.e., policy iteration and Q-learning) gained a lower reward than the clinician's policy (mean -2.68 and -2.76 vs. -2.66, respectively), but they gained more reward once the data size increased (mean -3.56 and -2.48 vs. -3.57, respectively). Conclusions: Our results highlight the potential of using RL to generate the optimal treatment based on patients' longitudinal records. Our work can lead the path toward the development of RL-based decision support systems that could facilitate daily practice in managing Alzheimer's disease with comorbidities.

5.
medRxiv ; 2023 Feb 01.
Article in English | MEDLINE | ID: mdl-36747787

ABSTRACT

Heart failure management is challenging due to the complex and heterogeneous nature of its pathophysiology, which makes conventional treatments based on a "one size fits all" approach unsuitable. Coupling longitudinal medical data with novel deep learning and network-based analytics will enable the identification of distinct patient phenotypic characteristics to help individualize the treatment regimen through accurate prediction of the physiological response. In this study, we develop a graph representation learning framework that integrates the heterogeneous clinical events in electronic health records (EHR) as graph-format data, in which patient-specific patterns and features are naturally infused for personalized prediction of lab test response. The framework includes a novel Graph Transformer Network equipped with a self-attention mechanism to model the underlying spatial interdependencies among the clinical events characterizing the cardiac physiological interactions in heart failure treatment, and a graph neural network (GNN) layer to incorporate the explicit temporality of each clinical event, which helps summarize the therapeutic effects induced on the physiological variables, and subsequently on the patient's health status, as the heart failure condition progresses over time. We introduce a global attention mask that is computed based on event co-occurrences and is aggregated across all patient records to enhance the guidance of neighbor selection in graph representation learning. We test the feasibility of our model through detailed quantitative and qualitative evaluations on observational EHR data.
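
A minimal sketch of the co-occurrence idea behind the global attention mask: count how often pairs of clinical events co-occur across patient records and row-normalize the counts into a soft mask. The event vocabulary and records below are illustrative placeholders, not the study's EHR data or its actual mask computation.

```python
# Hedged sketch: co-occurrence-based global attention mask over clinical events.
import numpy as np
from itertools import combinations

events = ["furosemide", "NT-proBNP", "creatinine", "carvedilol"]   # toy vocabulary
idx = {e: i for i, e in enumerate(events)}
records = [["furosemide", "NT-proBNP"],
           ["furosemide", "creatinine", "NT-proBNP"],
           ["carvedilol", "NT-proBNP"]]

co = np.zeros((len(events), len(events)))
for rec in records:
    for a, b in combinations(sorted(set(rec)), 2):   # count each co-occurring pair once
        co[idx[a], idx[b]] += 1
        co[idx[b], idx[a]] += 1

mask = co / np.maximum(co.sum(axis=1, keepdims=True), 1)  # row-normalized soft mask
print(np.round(mask, 2))
```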

6.
Clin Transl Sci ; 16(3): 398-411, 2023 03.
Article in English | MEDLINE | ID: mdl-36478394

ABSTRACT

An increasing number of studies have reported using natural language processing (NLP) to assist observational research by extracting clinical information from electronic health records (EHRs). Currently, no standardized reporting guidelines for NLP-assisted observational studies exist. The absence of detailed reporting guidelines may create ambiguity in the use of NLP-derived content, knowledge gaps in current research reporting practices, and reproducibility challenges. To address these issues, we conducted a scoping review of NLP-assisted observational clinical studies and examined their reporting practices, focusing on NLP methodology and evaluation. Through our investigation, we discovered high variation in reporting practices, such as inconsistent use of references for measurement studies, variation in reporting location (reference, appendix, and manuscript), and differing granularity of NLP methodology and evaluation details. To promote the wide adoption and utilization of NLP solutions in clinical research, we outline several perspectives that align with the six principles released by the World Health Organization (WHO) to guide the ethical use of artificial intelligence for health.


Subjects
Artificial Intelligence, Natural Language Processing, Humans, Electronic Health Records, Reproducibility of Results, Observational Studies as Topic
7.
J Biomed Inform ; 134: 104201, 2022 10.
Article in English | MEDLINE | ID: mdl-36089199

ABSTRACT

BACKGROUND: Knowledge graphs (KGs) play a key role in enabling explainable artificial intelligence (AI) applications in healthcare. Constructing clinical knowledge graphs (CKGs) from heterogeneous electronic health records (EHRs) has long been desired by the research and healthcare AI communities. From the standardization perspective, community-based standards such as the Fast Healthcare Interoperability Resources (FHIR) and the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) are increasingly used to represent and standardize EHR data for clinical data analytics; however, the potential of such standards for building CKGs has not been well investigated. OBJECTIVE: To develop and evaluate methods and tools that expose OMOP CDM-based clinical data repositories as virtual clinical KGs that are compliant with the FHIR Resource Description Framework (RDF) specification. METHODS: We developed a system called FHIR-Ontop-OMOP to generate virtual clinical KGs from OMOP relational databases. We leveraged an OMOP CDM-based Medical Information Mart for Intensive Care (MIMIC-III) data repository to evaluate the FHIR-Ontop-OMOP system in terms of the faithfulness of data transformation and the conformance of the generated CKGs to the FHIR RDF specification. RESULTS: A beta version of the system has been released. More than 100 data element mappings from 11 OMOP CDM clinical data, health system, and vocabulary tables were implemented in the system, covering 11 FHIR resources. The generated virtual CKG from MIMIC-III contains 46,520 instances of FHIR Patient, 716,595 instances of Condition, 1,063,525 instances of Procedure, 24,934,751 instances of MedicationStatement, 365,181,104 instances of Observation, and 4,779,672 instances of CodeableConcept. Patient counts identified by five pairs of SQL (over the MIMIC database) and SPARQL (over the virtual CKG) queries were identical, confirming the faithfulness of the data transformation. The generated CKG in RDF triples for 100 patients was fully conformant with the FHIR RDF specification. CONCLUSION: The FHIR-Ontop-OMOP system can expose an OMOP database as a FHIR-compliant RDF graph. It provides a meaningful use case demonstrating the potential enabled by the interoperability between FHIR and the OMOP CDM. Generated clinical KGs in FHIR RDF provide a semantic foundation to enable explainable AI applications in healthcare.
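
A minimal sketch of the faithfulness check described in the results: compare a patient count computed by SQL over the relational OMOP/MIMIC database with the same count from SPARQL over the virtual FHIR RDF graph. The database file, SPARQL endpoint URL, and table name are hypothetical placeholders, not the actual FHIR-Ontop-OMOP deployment.

```python
# Hedged sketch: SQL vs. SPARQL patient-count comparison (placeholder resources).
import sqlite3
from SPARQLWrapper import SPARQLWrapper, JSON

sql_conn = sqlite3.connect("mimic_omop.db")                       # hypothetical DB file
sql_count = sql_conn.execute(
    "SELECT COUNT(DISTINCT person_id) FROM person").fetchone()[0]

sparql = SPARQLWrapper("http://localhost:8080/sparql")            # hypothetical endpoint
sparql.setQuery("""
    PREFIX fhir: <http://hl7.org/fhir/>
    SELECT (COUNT(DISTINCT ?p) AS ?n) WHERE { ?p a fhir:Patient . }
""")
sparql.setReturnFormat(JSON)
sparql_count = int(sparql.query().convert()["results"]["bindings"][0]["n"]["value"])

print("Counts match:", sql_count == sparql_count)
```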


Subjects
Artificial Intelligence, Automated Pattern Recognition, Data Warehousing, Delivery of Health Care, Electronic Health Records, Humans
8.
BMC Med Genomics ; 15(1): 167, 2022 07 30.
Article in English | MEDLINE | ID: mdl-35907849

ABSTRACT

BACKGROUND: Next-generation sequencing provides comprehensive information about individuals' genetic makeup and is commonplace in precision oncology practice. Due to the heterogeneity of individual patients' disease conditions and treatment journeys, not all targeted therapies were initiated despite actionable mutations. To better understand and support the clinical decision-making process in precision oncology, there is a need to examine real-world associations between patients' genetic information and treatment choices. METHODS: To address the insufficient use of real-world data (RWD) in electronic health records (EHRs), we generated a single Resource Description Framework (RDF) resource, called PO2RDF (precision oncology to RDF), by integrating information regarding genes, variants, diseases, and drugs from genetic reports and EHRs. RESULTS: PO2RDF contains a total of 2,309,014 triples. Among them, 32,815 triples are related to Gene, 34,695 to Variant, 8,787 to Disease, and 26,154 to Drug. We performed two use case analyses to demonstrate the usability of PO2RDF: (1) we examined real-world associations between EGFR mutations and targeted therapies to confirm existing knowledge and detect off-label use; (2) we examined differences in prognosis for lung cancer patients with and without TP53 mutations. CONCLUSIONS: In conclusion, our work proposes using RDF to organize and distribute clinical RWD that is otherwise inaccessible externally. Our work serves as a pilot study that will lead to new clinical applications and could ultimately stimulate progress in the field of precision oncology.
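
A minimal rdflib sketch of the idea behind PO2RDF: representing a patient, gene, variant, disease, and drug as RDF triples. The namespace, URIs, and predicate names below are hypothetical placeholders, not the actual PO2RDF schema.

```python
# Hedged sketch: patient-gene-variant-disease-drug triples with rdflib.
from rdflib import Graph, Namespace, Literal

PO = Namespace("http://example.org/po2rdf/")   # placeholder namespace, not the real schema
g = Graph()
g.bind("po", PO)

g.add((PO["patient/123"], PO.hasVariant, PO["variant/EGFR_L858R"]))
g.add((PO["variant/EGFR_L858R"], PO.inGene, PO["gene/EGFR"]))
g.add((PO["patient/123"], PO.hasDisease, PO["disease/lung_adenocarcinoma"]))
g.add((PO["patient/123"], PO.receivedDrug, PO["drug/osimertinib"]))
g.add((PO["gene/EGFR"], PO.symbol, Literal("EGFR")))

print(g.serialize(format="turtle"))
```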


Subjects
Neoplasms, High-Throughput Nucleotide Sequencing, Humans, Medical Oncology, Neoplasms/drug therapy, Neoplasms/genetics, Pilot Projects, Precision Medicine
9.
Stud Health Technol Inform ; 290: 173-177, 2022 Jun 06.
Article in English | MEDLINE | ID: mdl-35672994

ABSTRACT

Reproducibility is an important quality criterion for the secondary use of electronic health records (EHRs). However, multiple barriers to reproducibility are embedded in the heterogeneous EHR environment. These barriers include complex processes for collecting and organizing EHR data and dynamic multi-level interactions occurring during information use (e.g., inter-personal, inter-system, and cross-institutional). To ensure reproducible use of EHRs, we investigated four information quality (IQ) dimensions and examined the implications for reproducibility based on a real-world EHR study. The four types of IQ measurements suggested that barriers to reproducibility occur at all stages of secondary use of EHR data. We discuss our recommendations and emphasize the importance of promoting transparent, high-throughput, and accessible data infrastructures and implementation best practices (e.g., data quality assessment and reporting standards).


Subjects
Electronic Health Records, Reproducibility of Results
10.
J Med Internet Res ; 24(7): e38584, 2022 07 06.
Article in English | MEDLINE | ID: mdl-35658098

ABSTRACT

BACKGROUND: Multiple types of biomedical association knowledge graphs, including COVID-19-related ones, have been constructed based on co-occurring biomedical entities retrieved from the recent literature. However, the applications derived from these raw graphs (eg, association predictions among genes, drugs, and diseases) have a high probability of false-positive predictions, as co-occurrence in the literature does not always mean there is a true biomedical association between two entities. OBJECTIVE: Data quality plays an important role in training deep neural network models; however, most of the current work in this area has focused on improving a model's performance with the assumption that the preprocessed data are clean. Here, we studied how to remove noise from raw knowledge graphs with limited labeled information. METHODS: The proposed framework used generative deep neural networks to generate a graph that can distinguish the unknown associations in the raw training graph. Two generative adversarial network models, NetGAN and Cross-Entropy Low-rank Logits (CELL), were adopted for the edge classification (ie, link prediction), leveraging unlabeled link information based on a real knowledge graph built from LitCovid and Pubtator. RESULTS: The performance of link prediction, especially in the extreme case of training data versus test data at a ratio of 1:9, demonstrated that the proposed method still achieved favorable results (area under the receiver operating characteristic curve >0.8 for the synthetic data set and 0.7 for the real data set), despite the limited amount of testing data available. CONCLUSIONS: Our preliminary findings showed that the proposed framework achieved promising results for removing noise during data preprocessing of the biomedical knowledge graph, potentially improving the performance of downstream applications by providing cleaner data.
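
A minimal sketch of the evaluation setting: classify candidate edges of a noisy graph and compute AUROC under the extreme 1:9 train/test split mentioned above. A simple logistic-regression edge classifier over Hadamard-product node-embedding features stands in for NetGAN/CELL, and the graph data are synthetic.

```python
# Hedged sketch: link-prediction evaluation with a 1:9 train/test split.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_edges, d = 2000, 16
node_emb = rng.normal(size=(200, d))                  # toy node embeddings
pairs = rng.integers(0, 200, size=(n_edges, 2))

# Standard edge featurization: element-wise (Hadamard) product of endpoint embeddings.
X = node_emb[pairs[:, 0]] * node_emb[pairs[:, 1]]
dots = X.sum(axis=1)
# Label: embedding agreement, with 10% flips standing in for literature co-occurrence noise.
y = ((dots > 0) ^ (rng.random(n_edges) < 0.1)).astype(int)

# Extreme split: only 10% of labeled edges available for training (1:9 ratio).
X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=0.1, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("AUROC:", round(roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]), 3))
```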


Subjects
COVID-19, Humans, Knowledge, Neural Networks (Computer), Automated Pattern Recognition, ROC Curve
11.
NPJ Digit Med ; 5(1): 77, 2022 Jun 14.
Article in English | MEDLINE | ID: mdl-35701544

ABSTRACT

Computational drug repurposing methods adapt artificial intelligence (AI) algorithms to discover new applications of approved or investigational drugs. Among heterogeneous datasets, electronic health record (EHR) datasets provide rich longitudinal and pathophysiological data that facilitate the generation and validation of drug repurposing hypotheses. Here, we present an appraisal of recently published research on computational drug repurposing utilizing the EHR. Thirty-three research articles, retrieved from Embase, Medline, Scopus, and Web of Science between January 2000 and January 2022, were included in the final review. Four themes are presented: (1) publication venue, (2) data types and sources, (3) methods for data processing and prediction, and (4) targeted disease, validation, and released tools. The review summarizes the contribution of EHRs to drug repurposing and reveals that their utilization is hindered by challenges in validation, accessibility, and understanding of EHR data. These findings can support researchers in the utilization of medical data resources and the development of computational methods for drug repurposing.

12.
Brief Bioinform ; 23(4)2022 07 18.
Article in English | MEDLINE | ID: mdl-35649342

ABSTRACT

Internal validation is the most popular evaluation strategy used for drug-target predictive models. The simple random shuffling used in cross-validation, however, is not always ideal for handling large, diverse, and copious datasets, as it can potentially introduce bias. Hence, these predictive models cannot be comprehensively evaluated to provide insight into their general performance across a variety of use cases (e.g. permutations of different levels of connectedness and categories in drug and target space, as well as validations based on different data sources). In this work, we introduce a benchmark, BETA, that aims to address this gap by (i) providing an extensive multipartite network consisting of 0.97 million biomedical concepts and 8.5 million associations, in addition to 62 million drug-drug and protein-protein similarities and (ii) presenting evaluation strategies that reflect seven cases (i.e. general, screening with different connectivity, target and drug screening based on categories, searching for specific drugs and targets and drug repurposing for specific diseases), amounting to seven Tests (344 Tasks in total) across multiple sampling and validation strategies. Six state-of-the-art methods covering two broad input data types (chemical structure- and gene sequence-based, and network-based) were tested across all the developed Tasks. The best- and worst-performing cases were analyzed to demonstrate the ability of the proposed benchmark to identify limitations of the tested methods on the benchmark tasks. The results highlight BETA as a benchmark for the selection of computational strategies for drug repurposing and target discovery.
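
A minimal sketch of the bias concern raised above: contrast random K-fold splits with drug-disjoint splits (GroupKFold grouped by drug), so that test-set drugs are unseen during training. Drug IDs and pair counts are illustrative only, not BETA's actual sampling strategies.

```python
# Hedged sketch: random vs. drug-disjoint cross-validation splits.
import numpy as np
from sklearn.model_selection import KFold, GroupKFold

rng = np.random.default_rng(0)
drug_ids = rng.integers(0, 50, size=500)       # drug of each drug-target pair (toy data)
X = np.zeros((500, 1))                         # features are irrelevant to the split itself

random_cv = KFold(n_splits=5, shuffle=True, random_state=0)
cold_start_cv = GroupKFold(n_splits=5)

for (tr, te), (ctr, cte) in zip(random_cv.split(X),
                                cold_start_cv.split(X, groups=drug_ids)):
    shared_random = set(drug_ids[tr]) & set(drug_ids[te])
    shared_cold = set(drug_ids[ctr]) & set(drug_ids[cte])
    print(f"random split shares {len(shared_random)} drugs across train/test; "
          f"drug-disjoint split shares {len(shared_cold)}")
```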


Subjects
Benchmarking, Drug Development, Algorithms, Preclinical Drug Evaluation, Drug Repositioning/methods, Proteins/genetics
14.
J Rural Health ; 38(4): 908-915, 2022 09.
Article in English | MEDLINE | ID: mdl-35261092

ABSTRACT

PURPOSE: Rural populations are disproportionately affected by the COVID-19 pandemic. We characterized urban-rural disparities in patient portal messaging utilization for COVID-19 among patients who used the portal during the early stage of the pandemic in the Midwest. METHODS: We collected over 1 million portal messages generated by midwestern Mayo Clinic patients from February to August 2020. We analyzed patient-generated messages (PGMs) on COVID-19 by urban-rural locality and incorporated patients' sociodemographic factors into the analysis. FINDINGS: The urban-rural ratios of portal users, message senders, and COVID-19 message senders were 1.18, 1.31, and 1.79, respectively, indicating greater use among urban patients. The urban-rural ratio (1.69) of PGMs on COVID-19 was higher than that (1.43) of general PGMs. The urban-rural ratios of messaging were 1.72-1.85 for COVID-19-related care and 1.43-1.66 for other health care issues related to COVID-19. Compared with urban patients, rural patients sent fewer messages for COVID-19 diagnosis and treatment but more messages for other reasons related to COVID-19-related health care (eg, isolation and anxiety). The frequent senders of COVID-19-related messages among rural patients were 40+ years old, women, married, and White. CONCLUSIONS: In this Midwest health system, rural patients were less likely to use patient online services during a pandemic, and their reasons for use differed from those of urban patients. The results suggest opportunities for increasing equity in rural patient engagement with patient portals (in particular, among minority populations) for COVID-19. Public health intervention strategies could target reasons why rural patients might seek health care in a pandemic, such as social isolation and anxiety.


Subjects
COVID-19, Adult, COVID-19/epidemiology, COVID-19 Testing, Female, Humans, Pandemics, Patient Participation, Rural Population
15.
JMIR Hum Factors ; 9(2): e35187, 2022 May 05.
Article in English | MEDLINE | ID: mdl-35171108

ABSTRACT

BACKGROUND: During the COVID-19 pandemic, patient portals and their message platforms allowed remote access to health care. Utilization patterns in patient messaging during the COVID-19 crisis have not been studied thoroughly. In this work, we propose characterizing patients and their use of asynchronous virtual care for COVID-19 via a retrospective analysis of patient portal messages. OBJECTIVE: This study aimed to perform a retrospective analysis of portal messages to probe asynchronous patient responses to the COVID-19 crisis. METHODS: We collected over 2 million patient-generated messages (PGMs) at Mayo Clinic during February 1 to August 31, 2020. We analyzed descriptive statistics on PGMs related to COVID-19 and incorporated patients' sociodemographic factors into the analysis. We analyzed the PGMs on COVID-19 in terms of COVID-19-related care (eg, COVID-19 symptom self-assessment and COVID-19 tests and results) and other health issues (eg, appointment cancellation, anxiety, and depression). RESULTS: The majority of PGMs on COVID-19 pertained to COVID-19 symptom self-assessment (42.50%) and COVID-19 tests and results (30.84%). The PGMs related to COVID-19 symptom self-assessment and COVID-19 test results had dynamic patterns and peaks similar to the newly confirmed cases in the United States and in Minnesota. The trend of PGMs related to COVID-19 care plans paralleled trends in newly hospitalized cases and deaths. After an initial peak in March, the PGMs on issues such as appointment cancellations and anxiety regarding COVID-19 displayed a declining trend. The majority of message senders were 30-64 years old, married, female, White, or urban residents. This majority was an even higher proportion among patients who sent portal messages on COVID-19. CONCLUSIONS: During the COVID-19 pandemic, patients increased portal messaging utilization to address health care issues about COVID-19 (in particular, symptom self-assessment and tests and results). Trends in message usage closely followed national trends in new cases and hospitalizations. There is a wide disparity for minority and rural populations in the use of PGMs for addressing the COVID-19 crisis.

16.
J Biomed Inform ; 127: 104002, 2022 03.
Article in English | MEDLINE | ID: mdl-35077901

ABSTRACT

OBJECTIVE: The large-scale collection of observational data and digital technologies could help curb the COVID-19 pandemic. However, the coexistence of multiple Common Data Models (CDMs) and the lack of data extract, transform, and load (ETL) tools between different CDMs cause potential interoperability issues between different data systems. The objective of this study is to design, develop, and evaluate an ETL tool that transforms PCORnet CDM-format data into the OMOP CDM. METHODS: We developed an open-source ETL tool to facilitate data conversion from the PCORnet CDM to the OMOP CDM. The ETL tool was evaluated using a dataset of 1000 patients randomly selected from the PCORnet CDM at Mayo Clinic. Information loss, data mapping accuracy, and gap analysis approaches were used to assess the performance of the ETL tool. We designed an experiment to conduct a real-world COVID-19 surveillance task to assess the feasibility of the ETL tool. We also assessed the capacity of the ETL tool for COVID-19 data surveillance using the data collection criteria of the MN EHR Consortium COVID-19 project. RESULTS: After the ETL process, all records of the 1000 patients from 18 PCORnet CDM tables were successfully transformed into 12 OMOP CDM tables. The information loss for all concept mappings was less than 0.61%. The string mapping process for the unit concepts lost 2.84% of records. Almost all fields in the manual mapping process achieved 0% information loss, except the specialty concept mapping. Moreover, the mapping accuracy for all fields was 100%. The COVID-19 surveillance task collected almost the same set of cases (99.3% overlap) from the original PCORnet CDM and the target OMOP CDM separately. Finally, all data elements for the MN EHR Consortium COVID-19 project could be captured from both the PCORnet CDM and the OMOP CDM. CONCLUSION: We demonstrated that our ETL tool satisfies the data conversion requirements between the PCORnet CDM and the OMOP CDM. This work will facilitate data retrieval, communication, sharing, and analysis between different institutions, not only for COVID-19-related projects but also for other real-world, evidence-based observational studies.
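
A minimal pandas sketch of one ETL mapping step and the information-loss metric described above: map source codes to target concept IDs via a lookup table and report the share of records left unmapped. Table names, column names, and code-to-concept pairs are hypothetical placeholders, not the actual PCORnet-to-OMOP mappings.

```python
# Hedged sketch: concept mapping step plus an information-loss calculation.
import pandas as pd

pcornet_dx = pd.DataFrame({"patid": [1, 2, 3, 4],
                           "dx_code": ["U07.1", "I50.9", "E11.9", "XXX"]})
concept_map = pd.DataFrame({"source_code": ["U07.1", "I50.9", "E11.9"],
                            "omop_concept_id": [1001, 1002, 1003]})   # placeholder IDs

merged = pcornet_dx.merge(concept_map, left_on="dx_code",
                          right_on="source_code", how="left")
loss = merged["omop_concept_id"].isna().mean() * 100
print(f"Information loss: {loss:.2f}% of records had no target mapping")
```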


Subjects
COVID-19, COVID-19/epidemiology, Factual Databases, Electronic Health Records, Humans, Information Storage and Retrieval, Pandemics, SARS-CoV-2
17.
J Am Med Inform Assoc ; 28(11): 2313-2324, 2021 10 12.
Article in English | MEDLINE | ID: mdl-34505903

ABSTRACT

OBJECTIVE: The study sought to test the feasibility of conducting a phenome-wide association study to characterize phenotypic abnormalities associated with individuals at high risk for lung cancer using electronic health records. MATERIALS AND METHODS: We used the beta release of the All of Us Researcher Workbench with clinical and survey data from a population of 225,000 subjects. We identified 3 cohorts of individuals at high risk of developing lung cancer based on (1) the 2013 U.S. Preventive Services Task Force criteria, (2) the long-term quitters of cigarette smoking criteria, and (3) the younger age of onset criteria. We applied logistic regression analysis to identify significant associations between individuals' phenotypes and their risk categories. We validated our findings against a lung cancer cohort from the same population and conducted an expert review to understand whether these associations are known or potentially novel. RESULTS: We found a total of 214 statistically significant associations (P < .05 with a Bonferroni correction and odds ratio > 1.5) enriched in the high-risk individuals from the 3 cohorts, and 15 enriched in the low-risk individuals. Forty significant associations enriched in the high-risk individuals and 13 enriched in the low-risk individuals were validated in the cancer cohort. Expert review identified 15 potentially new associations enriched in the high-risk individuals. CONCLUSIONS: It is feasible to conduct a phenome-wide association study to characterize phenotypic abnormalities associated with individuals at high risk of developing lung cancer using electronic health records. The All of Us Researcher Workbench is a promising resource for research studies to evaluate and optimize lung cancer screening criteria.
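
A minimal sketch of the phenome-wide scan pattern: fit one logistic regression per phenotype against risk-cohort membership, then keep associations passing a Bonferroni-corrected p-value threshold with odds ratio > 1.5, as in the criteria above. The data are synthetic and the phenotype names are placeholders.

```python
# Hedged sketch: per-phenotype logistic regression with Bonferroni filtering.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 5000
phenotypes = ["copd", "emphysema", "chronic_cough"]     # placeholder phenotype names
high_risk = rng.integers(0, 2, size=n)                  # 1 = high-risk cohort member

results = {}
for name in phenotypes:
    # Synthetic phenotype, mildly enriched among high-risk individuals.
    pheno = (rng.random(n) < 0.05 + 0.05 * high_risk).astype(float)
    X = sm.add_constant(pheno)
    fit = sm.Logit(high_risk, X).fit(disp=0)
    results[name] = (float(np.exp(fit.params[1])), float(fit.pvalues[1]))  # (OR, p)

bonferroni_alpha = 0.05 / len(phenotypes)
for name, (odds_ratio, p) in results.items():
    if p < bonferroni_alpha and odds_ratio > 1.5:
        print(f"{name}: OR={odds_ratio:.2f}, p={p:.2e} (significant)")
```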


Subjects
Lung Neoplasms, Population Health, Early Detection of Cancer, Electronic Health Records, Genome-Wide Association Study, Humans, Lung Neoplasms/epidemiology, Phenotype
18.
AMIA Jt Summits Transl Sci Proc ; 2021: 410-419, 2021.
Article in English | MEDLINE | ID: mdl-34457156

ABSTRACT

HL7 Fast Healthcare Interoperability Resources (FHIR) is one of the current data standards for enabling electronic healthcare information exchange. Previous studies have shown that FHIR is capable of modeling both structured and unstructured data from electronic health records (EHRs). However, the capability of FHIR in enabling clinical data analytics has not been well investigated. The objective of this study is to demonstrate how FHIR-based representation of unstructured EHR data can be ported to deep learning models for text classification in clinical phenotyping. We leverage and extend the NLP2FHIR clinical data normalization pipeline and conduct a case study with two obesity datasets. We tested several deep learning-based text classifiers, such as convolutional neural networks, gated recurrent units, and text graph convolutional networks, on both raw text and NLP2FHIR inputs. We found that the combination of NLP2FHIR input and text graph convolutional networks has the highest F1 score. Therefore, FHIR-based deep learning methods have the potential to be leveraged in supporting EHR phenotyping, making the phenotyping algorithms more portable across EHR systems and institutions.


Subjects
Deep Learning, Algorithms, Electronic Health Records, Humans, Obesity, Pilot Projects
19.
JMIR Med Inform ; 9(5): e23586, 2021 May 25.
Article in English | MEDLINE | ID: mdl-34032581

ABSTRACT

BACKGROUND: Precision oncology has the potential to leverage clinical and genomic data in advancing disease prevention, diagnosis, and treatment. A key research area focuses on the early detection of primary cancers and the potential prediction of cancers of unknown primary in order to facilitate optimal treatment decisions. OBJECTIVE: This study presents a methodology to harmonize phenotypic and genetic data features to classify primary cancer types and predict cancers of unknown primary. METHODS: We extracted genetic data elements from oncology genetic reports of 1011 patients with cancer and their corresponding phenotypic data from Mayo Clinic's electronic health records. We modeled both genetic and electronic health record data with HL7 Fast Healthcare Interoperability Resources. The semantic web Resource Description Framework was employed to generate the network-based data representation (ie, patient-phenotypic-genetic network). Based on the Resource Description Framework data graph, the Node2vec graph-embedding algorithm was applied to generate features. Multiple machine learning and deep learning backbone models were compared for cancer prediction performance. RESULTS: With 6 machine learning tasks designed in the experiment, we demonstrated that the proposed method achieved favorable results in classifying primary cancer types (area under the receiver operating characteristic curve [AUROC] 96.56% for all 9 cancer predictions on average based on cross-validation) and predicting unknown primaries (AUROC 80.77% for all 8 cancer predictions on average for real-patient validation). To demonstrate interpretability, the 17 phenotypic and genetic features that contributed most to the prediction of each cancer were identified and validated based on a literature review. CONCLUSIONS: Accurate prediction of cancer types can be achieved with existing electronic health record data with satisfactory precision. The integration of genetic reports improves prediction, illustrating the translational value of incorporating genetic tests early at the diagnosis stage for patients with cancer.
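
A minimal sketch of the embed-then-classify idea: uniform random walks over a toy patient-phenotype-gene graph, Word2Vec node embeddings (a simplified stand-in for Node2vec's biased walks), and an off-the-shelf classifier on the patient embeddings. The graph, node names, and labels are hypothetical, not the study's RDF network or pipeline.

```python
# Hedged sketch: random-walk node embeddings followed by a simple classifier.
import random
import networkx as nx
from gensim.models import Word2Vec
from sklearn.linear_model import LogisticRegression

G = nx.Graph()
G.add_edges_from([("pt1", "EGFR_mut"), ("pt1", "cough"), ("pt2", "TP53_mut"),
                  ("pt2", "weight_loss"), ("pt3", "EGFR_mut"), ("pt3", "cough")])

random.seed(0)
walks = []
for _ in range(200):                       # uniform random walks (simplified Node2vec)
    node = random.choice(list(G.nodes))
    walk = [node]
    for _ in range(5):
        node = random.choice(list(G.neighbors(node)))
        walk.append(node)
    walks.append(walk)

emb = Word2Vec(walks, vector_size=16, window=3, min_count=0, sg=1, seed=0)
X = [emb.wv[p] for p in ["pt1", "pt2", "pt3"]]
y = [1, 0, 1]                              # hypothetical cancer-type labels
clf = LogisticRegression(max_iter=1000).fit(X, y)
print(clf.predict(X))
```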

20.
Brief Bioinform ; 22(1): 568-580, 2021 01 18.
Article in English | MEDLINE | ID: mdl-31885036

ABSTRACT

To enable modularization for network-based prediction, we conducted a review of known methods for the various subtasks involved in creating a drug-target prediction framework, together with benchmarking to determine the highest-performing approaches. Accordingly, our contributions are as follows: (i) from a network perspective, we benchmarked the association-mining performance of 32 distinct subnetwork permutations, arranged based on a comprehensive heterogeneous biomedical network derived from 12 repositories; (ii) from a methodological perspective, we identified the best prediction strategy based on a review of combinations of the components with off-the-shelf classification, inference methods, and graph embedding methods. Our benchmarking strategy consisted of two series of experiments, totaling six distinct tasks from the two perspectives, to determine the best prediction. We demonstrated that the proposed method outperformed the existing network-based methods and showed how combinatorial networks and methodologies can influence the prediction. In addition, we conducted disease-specific prediction tasks for 20 distinct diseases and showed the reliability of the strategy in predicting 75 novel drug-target associations, as validated against DrugBank 5.1.0. In particular, we revealed a connection between network topology and the biological explanations for predicting the diseases 'Asthma', 'Hypertension', and 'Dementia'. The results of our benchmarking produced knowledge on a network-based prediction framework with modularization of feature selection and association prediction, which can be easily adapted and extended to other feature sources or machine learning algorithms, as well as a performance baseline to comprehensively evaluate the utility of incorporating varying data sources.


Subjects
Drug Development/methods, Genomics/methods, Asthma/drug therapy, Dementia/drug therapy, Humans, Hypertension/drug therapy, Molecular Targeted Therapy/methods